Automatic Subspace Clustering of High Dimensional Data for Data Mining
Abstract
Data mining applications place special requirements on clustering algorithms, including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
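The abstract does not spell out the procedure, so the following is only a minimal sketch of the grid-based, bottom-up idea behind finding dense clusters in subspaces, under stated assumptions. Every coordinate is assumed to lie in [0, 1). Each dimension is split into `xi` equal intervals, and a grid cell ("unit") is treated as dense when it holds more than a fraction `tau` of the points. The names `xi`, `tau`, `dense_units`, `group_by_subspace`, `clusters_in_subspace` and `dnf` are hypothetical. Dense units are grown one dimension at a time, Apriori-style, using the fact that every projection of a dense unit must itself be dense. A cluster is then a set of connected dense units in one subspace, written as an OR of ANDs of interval bounds. Unlike CLIQUE as described above, this sketch does not minimize the DNF and omits any further pruning of subspaces.

```python
from collections import defaultdict
from itertools import combinations


def unit_of(point, dims, xi):
    """Grid cell ("unit") of `point` in the subspace spanned by `dims`, as a
    tuple of (dimension, interval index) pairs. Assumption: coordinates lie in
    [0, 1) and each dimension is cut into xi equal-width intervals."""
    return tuple((d, int(point[d] * xi)) for d in dims)


def dense_units(points, xi, tau):
    """Bottom-up search for dense units in every subspace. Assumption: a unit
    is dense if it holds more than tau * len(points) points.
    Returns {k: set of dense k-dimensional units}."""
    n, n_dims = len(points), len(points[0])
    counts = defaultdict(int)
    for p in points:  # level 1: one-dimensional intervals
        for d in range(n_dims):
            counts[unit_of(p, (d,), xi)] += 1
    current = {u for u, c in counts.items() if c > tau * n}
    levels, k = {}, 1
    while current:
        levels[k] = current
        # Join two dense k-units that share their first k-1 (dim, interval)
        # pairs and end in different dimensions (Apriori-style candidates).
        candidates = set()
        for a, b in combinations(sorted(current), 2):
            if a[:-1] == b[:-1] and a[-1][0] != b[-1][0]:
                cand = tuple(sorted(a + (b[-1],)))
                # Prune: every k-dimensional projection must already be dense.
                if all(sub in current for sub in combinations(cand, k)):
                    candidates.add(cand)
        counts = {}
        for cand in candidates:
            dims = tuple(d for d, _ in cand)
            counts[cand] = sum(unit_of(p, dims, xi) == cand for p in points)
        current = {u for u, c in counts.items() if c > tau * n}
        k += 1
    return levels


def group_by_subspace(units):
    """Split dense units by the tuple of dimensions they span."""
    groups = defaultdict(set)
    for u in units:
        groups[tuple(d for d, _ in u)].add(u)
    return dict(groups)


def clusters_in_subspace(units):
    """Connected components of dense units from ONE subspace; two units are
    neighbours when they differ by exactly 1 in exactly one interval index."""
    units = sorted(units)
    seen, clusters = set(), []
    for start in units:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # depth-first flood fill over adjacent units
            u = stack.pop()
            comp.append(u)
            for v in units:
                if v not in seen and sum(abs(i - j) for (_, i), (_, j) in zip(u, v)) == 1:
                    seen.add(v)
                    stack.append(v)
        clusters.append(sorted(comp))
    return clusters


def dnf(cluster, xi):
    """Unminimized DNF description: OR over units, each an AND of bounds."""
    terms = []
    for unit in cluster:
        conds = [f"{i / xi:g} <= x{d} < {(i + 1) / xi:g}" for d, i in unit]
        terms.append("(" + " AND ".join(conds) + ")")
    return " OR ".join(terms)


points = [(0.10, 0.10, 0.90), (0.15, 0.20, 0.10), (0.20, 0.30, 0.60),
          (0.05, 0.35, 0.30), (0.80, 0.90, 0.70)]
levels = dense_units(points, xi=4, tau=0.3)
top = max(levels)  # highest dimensionality that still has dense units
for dims, units in group_by_subspace(levels[top]).items():
    for cluster in clusters_in_subspace(units):
        print(dims, dnf(cluster, xi=4))
```

On this toy input the densest subspace found is dimensions (0, 1), and the single cluster is printed as `(0 <= x0 < 0.25 AND 0 <= x1 < 0.25) OR (0 <= x0 < 0.25 AND 0.25 <= x1 < 0.5)`. Because only per-unit counts are used, shuffling `points` does not change the result. This matches the order-insensitivity the abstract claims, although this sketch is not the paper's implementation.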
Similar resources
Subspace clustering with automatic feature grouping
This paper proposes a subspace clustering algorithm with automatic feature grouping for clustering high-dimensional data. In this algorithm, a new component is introduced into the objective function to capture the feature groups and a new iterative process is defined to optimize the objective function so that the features of high-dimensional data are grouped automatically. Experiments on both s...
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithms for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
Subspace outlier mining in large multimedia databases
Increasingly large multimedia databases in life sciences, e-commerce, or monitoring applications cannot be browsed manually, but require automatic knowledge discovery in databases (KDD) techniques to detect novel and interesting patterns. Clustering aims at grouping similar objects into clusters while separating dissimilar objects. Density-based clustering has been shown to detect arbitrarily shaped...
Finding and Visualizing Subspace Clusters of High Dimensional Dataset Using Advanced Star Coordinates
Analysis of high dimensional data has been a research area for many years. Analysts can detect similarity of data points within a cluster. Subspace clustering detects useful dimensions when clustering high dimensional datasets. Visualization allows better insight into subspace clusters. However, displaying such high dimensional database clusters on a 2-dimensional display is a challenging task. We p...
Automatic motion capture data denoising via filtered subspace clustering and low rank matrix approximation
In this paper, we present an automatic Motion Capture (MoCap) data denoising approach via filtered subspace clustering and low rank matrix approximation. Within the proposed approach, we formulate the MoCap data denoising problem as a concatenation of piecewise motion matrix recovery problems. To this end, we first present a filtered subspace clustering approach to separate the noisy MoCap seque...